Overview

The purpose of this project is to investigate the provision of timely and effective emergency department care in Pennsylvania. The aims of the project are:
+ To compare emergency department access and ambulatory care metrics in Pennsylvania to that of other states and to national averages.
+ To compare emergency department access and ambulatory care metrics within Pennsylvania at the county level.
+ To investigate if there are any relationships between socioeconomic and demographic variables and access and care metrics using multivariate regressions.

Introduction

Access to health care means having “the timely use of personal health services to achieve the best health outcomes” (IOM, 1993). Timely and effective ambulatory care is an important cornerstone of healthcare access and is essential for good patient outcomes and satisfaction. This is especially true in the US, where the cost of healthcare is high and ever increasing, but healthcare outcomes remain poor, especially for a developed nation. It is important that care not only be evidence-based and clinically indicated, but also timely and prompt, with delivery of the “right care at the right time.”

Ambulatory care includes primary care, outpatient care and emergency room care. Ideally, patients can use primary care and outpatient for usual care, and only visit the ED for emergent conditions. Unfortunately, given the cost of healthcare and differential access to quality insurance, many patients use the ED as a ready alternative to their usual source of medical care. As such, many US adults instead visit hospital EDs with problems that could have been managed appropriately in general primary care practice.

This added strain on emergency departments across the nation has led to issues of overcrowding, long wait times and significant strain on limited resources. This is not without its repercussions, as this thereby limits the quality and timeliness of care through the ED. The frequency of ED visits is thought particularly high among people who do not have a regular source for health care or who are uninsured.

Because people are substituting the ER for regular inpatient and outpatient treatments, we need to look specifically at the ER to get a better idea of what’s happening to these groups and if ED care is meeting these needs. Thus, in this interdisciplinary project, I attempt to examine metrics of timely and effective emergency department care and how they vary in Pennsylvania.

With respect to population health, Pennsylvania falls below national averages, ranking 28th among the 50 states in 2019 according to United Health Foundation’s America’s Health Rankings. PA ranked #28 for preventable hospitalizations. As in other states across the country, measures of health status in Pennsylvania vary by geographic and socioeconomic factors which, according to the AARP Pennsylvania in 2019, are combining to restrict access to health care services in Pennsylvania. According to AARP, these structural health disparities are particularly noted among those living in rural and low-resourced areas of the state and among URM communities (Black/African American and Latino), all of whom lack access to health care and culturally appropriate, evidence-based health promoting and chronic disease management practices.

In the EM landscape, there are numerous proxies for Timely and Effective Care (TEC).
+ The goal for patients with stroke should be a door-to-head CT time within 20 minutes in ≥50% of stroke patients who may be candidates for IV tPA or mechanical thrombectomy.
+ The goal for patients with ST-elevation myocardial infarction should be a door-to-needle time within 30 minutes (for thrombolysis) and a door-to-balloon time within 90 minutes (for PCI).
+ The goal for patients with ST-elevation myocardial infarction at a non-equipped hospital to be transferred to to a specialty hospital should be a door-in-door-out time within 30 minutes.
Additionally, given the wide publicity that long wait times has attracted it has attracted (on average, >2 hours across the US), ED waiting time is also an appropriate proxy.

This is an interdisciplinary issue, as it has implications for physicians, public health advocates, epidemiologists, economists, public service advocates, etc. As such, addressing the problem of timely access to care requires a multidisciplinary approach, as I have undertaken here.This project was supported by Dr. Gary Weissman, MD, MSHP (Assistant Professor of Medicine and Senior Fellow Leonard Davies Institute) and Dr. John Holmes PhD, FACE, FACMI (Professor of Medical Informatics in Epidemiology, Associate Director of the Institute for Biomedical Informatics), who both provided invaluable advice and guidance for mapping the data and assumptions for service areas.

Data Sources

Major data sources used in this project are listed below. These sources are all publicly available.
+ “TEC data”: Timely and Effective Care (TEC) data from CMS through 2020. Contains national, county-level and hospital-level data.
+ “Census data”: US Census Bureau data available in the tidycensus R package. Using acs5 survey 5-year data ending 2019.

Methods

First I investigated the TEC data: I visualized how TEC metrics vary in Pennsylvania as compared to other states regionally and nationally. Then, I visualized how TEC metrics. Then, I visualized variations of TEC metrics in Pennsylvania at the county-level using maps, highlighting the two most populous counties of Philadelphia County (with capital city of Philadelphia) and Alleghany County (with second major city of Pittsburgh).

Next, I investigated the socioeconomic and demographic data, visualizing variations by county in PA as they vary across PA by county.

Finally, I investigated the relationships between the TEC data and socioeconomic data using multivariate regression. In order to do this, I needed to overcome the problem that the TEC data is at the hospital level and the socioeconomic/demographic data is at the county level by converting both to zip code level data. For the TEC data, I used the hospital addresses calculate catchment/service areas around each hospital ED, assuming a 5-mile radius around hospitals based on zip code.

Preparing packages and data

All necessary packages are installed and loaded, and necessary keys are registered.

## Loading required package: knitr

Functions necessary for later processing are created.

# Theme for maps
my_theme <- function() {
  theme_minimal() +                                  
  theme(
        axis.line = element_blank(),                 
        axis.text = element_blank(),                 
        axis.title = element_blank(),
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.5, "cm"),          
        legend.text = element_text(size = 8),       
        legend.title = element_text(size = 10),
        plot.title = element_text(size = 14),
                                  #hjust = 0.5),
       #panel.grid.major = element_line(color = "#ebebe5", size = 0.2),
       #panel.grid.minor = element_blank(),
       #plot.background = element_rect(fill = "#f5f5f2", color = NA), 
       #panel.background = element_rect(fill = "#f5f5f2", color = NA), 
       #legend.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.border = element_blank())
}

# Min for map scales
prev_min <- function(x) {
  min(c(x),na.rm=TRUE)
}

# Max for map scales
prev_max <- function(x) {
  max(c(x), na.rm=TRUE)
}

# Theme for bar graphs
my_theme_bar <- function() {
  theme_minimal() +                                  
  theme(
        axis.line = element_blank(),                 
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.5, "cm"),          
        legend.text = element_text(size = 8),       
        legend.title = element_text(size = 10),
        plot.title = element_text(size = 14),
       panel.border = element_blank())
}

# Percent function
percent <- function(x, format = "f"){ 
  paste0(formatC(x, format = format, digits = 2), "%")
}

# Function to extract legend for shared graphs
g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)}

Necessary inputs are downloaded and/or read.
+ TEC data is downloaded directly from the CMS website.
+ Counties map data from the tigris package.
+ 2019 County population data downloaded from US Census Bureau County Population Totals 2010-2019.

# TEC data
tec_hosp <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/350f34f9ef3d484925d49dfcce7a0f54_1632355520/Timely_and_Effective_Care-Hospital.csv")
tec_state <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/0b2db52e1ce72942f369cc00c00649f8_1632355520/Timely_and_Effective_Care-State.csv")
tec_natl <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/4766dbf7fd2de19618a27f5546954463_1632355520/Timely_and_Effective_Care-National.csv")

# PA counties data
pa_counties<-tigris::counties(state = "42", cb = TRUE, progress_bar = FALSE) #FIPS 42 is Pennsylvania

# PA counties population data
pa_pop<- readxl::read_xlsx("/Users/moniquearnold/Documents/PSOM/BMIN 503/Final Project/BMIN503_Final_Project/Data/co-est2019-annres-42.xlsx",range="A6:M72",
   col_names = c("County","Census","Estimate Base",
                 "pop_2010",
                 "pop_2011",
                 "pop_2012",
                 "pop_2013",
                 "pop_2014",
                 "pop_2015",
                 "pop_2016",
                 "pop_2017",
                 "pop_2018",
                 "pop_2019"))


# Cleaning data
tec_state$NAME <- state.name[match(tec_state$State,state.abb)]
tec_state$Score<- as.numeric(tec_state$Score)

pa_pop$County<- gsub('^\\.|\\.$', '', pa_pop$County) 
pa_pop$NAME<- gsub(" County[,%] Pennsylvania$","", pa_pop$County)

Visualizing national data

First, national data is visualized using us_states dataset provided by the spData package. Note: ‘total_pop_15’ is the numerical vector of total population in 2015. For simplicity only the contiguous United States is considered.

Metrics investigated are:
+ OP_18b: Average (median) time patients spent in the ED
+ OP_18c: Average (median) time patients spent in the ED (Psychiatric/Mental Health Patients)
+ OP_2: Percentage of outpatients with chest pain or possible heart attack who got drugs to break up blood clots within 30 minutes of arrival
+ OP_3b: Percentage of patients who came to the ED with stroke symptoms who received brain scan results within 45 minutes of arrival
+ OP_23: Average (median) number of minutes before outpatients with chest pain or possible heart attack who needed specialized care were transferred to another hospital

# Select variables to investigate
varlist <- c("OP_18b","OP_18c","OP_2","OP_23","OP_3b")

state_vars <- tec_state %>% 
   dplyr::select(NAME,State,Measure.ID,Score) %>%
   filter(Measure.ID %in% varlist) %>% 
   drop_na() %>%
   reshape(idvar = c("State","NAME"), timevar = "Measure.ID", direction = "wide", varying=varlist)

# Merge to usa map data
usa <- us_states
to_map_natl<- merge(usa,state_vars, stringsAsFactors=FALSE)
str(to_map_natl) #check
## Classes 'sf' and 'data.frame':   48 obs. of  13 variables:
##  $ NAME        : chr  "Alabama" "Arizona" "Arkansas" "California" ...
##  $ GEOID       : chr  "01" "04" "05" "06" ...
##  $ REGION      : Factor w/ 4 levels "Norteast","Midwest",..: 3 4 3 4 4 1 3 3 3 4 ...
##  $ AREA        : Units: [km^2] num  133709 295281 137690 409747 269573 ...
##  $ total_pop_10: num  4712651 6246816 2872684 36637290 4887061 ...
##  $ total_pop_15: num  4830620 6641928 2958208 38421464 5278906 ...
##  $ State       : chr  "AL" "AZ" "AR" "CA" ...
##  $ OP_18b      : num  139 174 126 163 140 163 194 152 145 129 ...
##  $ OP_18c      : num  213 291 201 269 234 278 300 225 289 181 ...
##  $ OP_2        : num  51 69 38 48 60 33 NA 54 39 33 ...
##  $ OP_23       : num  62 72 73 73 73 75 68 75 70 62 ...
##  $ OP_3b       : num  74 71 62 78 56 64 NA 56 72 62 ...
##  $ geometry    :sfc_MULTIPOLYGON of length 48; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:51, 1:2] -88.2 -88.2 -87.4 -86.9 -85.6 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:12] "NAME" "GEOID" "REGION" "AREA" ...
to_map_northeast<- to_map_natl %>% filter(REGION=="Norteast")
to_map_pa <- to_map_natl %>% filter(NAME=="Pennsylvania")

#CANNOT GET THIS TO WORK
# for (i in varlist){
#    p <- ggplot() +
#       geom_sf(data = to_map_natl, aes(fill = i), lwd=0, colour="white") +
#       coord_sf() +
#       my_theme() +
#       scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
#                          limit = range(prev_min(to_map_natl[[i]]), prev_max(to_map_natl[[i]]))) +
#       labs(title= "Average (median) time patients spent in the ED",
#         subtitle= "July - December 2020",
#         caption = "Source: Centers for Disease Control") +
#       theme(legend.position = "bottom")
#    print(p)
# }


# Plot maps
#OP_18b
map_natl_OP_18b<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_18b), lwd=0, colour="white") + 
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_18b), prev_max(to_map_natl$OP_18b))) +
   labs(title= "Fig. 1: Average (median) time patients spent in the ED",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslvania median = 156 mins \nUS national median = 148 mins"), stat = "unique")

# map_natl_OP_18b_NE<-ggplot(to_map_northeast) + 
#    geom_sf(aes(fill = OP_18b), lwd=0, colour="white") + 
#    geom_sf_label(aes(label = State)) +
#    coord_sf() + 
#    my_theme() + 
#    scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
#                          limit = range(prev_min(to_map_natl$OP_18b), prev_max(to_map_natl$OP_18b))) +
#    labs(subtitle= "Northeast") +
#    theme(legend.position = "none")
# 
# lay <- rbind(c(1,1,1,NA),
#              c(1,1,1,2),
#              c(1,1,1,2))
# grid.arrange(map_natl_OP_18b, map_natl_OP_18b_NE, layout_matrix = lay)


map_natl_OP_18c<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_18c), lwd=0) + 
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_18c), prev_max(to_map_natl$OP_18c))) +
   labs(title= "Fig. 2: Average (median) time patients spent in the ED \n(Psychiatric/Mental Health Patients)",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 263 mins \nUS national median = 248 mins"), stat = "unique")

map_natl_OP_2<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_2), lwd=0) + 
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +   
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "% of patients", option = "magma", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_2), prev_max(to_map_natl$OP_2))) +
   labs(title= "Fig. 3: Percentage of outpatients with chest pain or possible heart \nattack who got drugs to break up blood clots within 30 minutes of arrival",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 32% \nUS national median = 52%"), stat = "unique")   

map_natl_OP_23<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_23), lwd=0) +
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "% of patients", option = "cividis", direction = 1,
                         limit = range(prev_min(to_map_natl$OP_23), prev_max(to_map_natl$OP_23))) + 
   labs(title= "Fig. 4: Percentage of patients who came to the ED with stroke symptoms \nwho received brain scan results within 45 minutes of arrival",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 65% \nUS national median = 72%"), stat = "unique")  

map_natl_OP_3b<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_3b), lwd=0) +
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "magma", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_3b), prev_max(to_map_natl$OP_3b))) +
   labs(title= "Fig. 5: Average (median) number of minutes before outpatients with chest pain \nor possible heart attack who needed specialized care were transferred to \nanother hospital",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 76 mins \nUS national median = 61 mins"), stat = "unique")  

map_natl_OP_18b

map_natl_OP_18c

map_natl_OP_2

map_natl_OP_23

map_natl_OP_3b

Comparing PA to regional Northeast data

Next, median metrics of Pennsylvania are compared to those of other states in the Northeast.
Note: Since only the final median values and not the underlying data, I cannot perform any statistical comparisons.

# Determine averages
fg_northeast<- as.data.frame(to_map_natl) %>% 
   filter(REGION=="Norteast") %>%
   dplyr::select(NAME,OP_18b,OP_18c,OP_2,OP_23,OP_3b)

# Reshape for barchart
forgraph_northeast <- melt(setDT(fg_northeast), id.vars = c("NAME"), variable.name = "score")

# fg_ne_sum<-summarize_all(fg_northeast,mean,na.rm = TRUE) %>% mutate(NAME="Northeast States")
# 
# fg_natl <- tec_natl %>% 
#    dplyr::select(Measure.ID,Score) %>%
#    mutate(NAME="US National") %>%
#    filter(Measure.ID %in% varlist) %>% 
#    reshape(idvar="NAME", timevar = "Measure.ID", direction = "wide", varying=varlist)
#forgraph_all <- rbind(fg_northeast,fg_ne_sum)


# Plot graphs
bar_OP_18b18c <- forgraph_northeast %>% 
   filter(variable=="OP_18b" | variable=="OP_18c") %>%
   ggplot(aes(x=reorder(factor(NAME),value),  y=value, fill=variable)) + 
   geom_bar(stat="identity", colour = "black", position = "stack") + 
   my_theme_bar() + 
   theme(axis.title.y=element_blank()) +
   geom_text(aes(label=value), vjust=1, color="white", size=3.5) +
   scale_y_continuous(name="Median Time (mins)") + 
   coord_flip() + 
   labs(title= "Fig. 6: Average (median) time patients spent in the ED",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   scale_fill_brewer(palette = "Set1",
                     breaks=c("OP_18b", "OP_18c"),
                     labels=c("All Patients", "Psychiatric/Mental Health Patients"))  + 
   theme(legend.title=element_blank())

bar_OP_2_23 <- forgraph_northeast %>% 
   filter(variable=="OP_2" | variable=="OP_23") %>%
   ggplot(aes(x=reorder(factor(NAME),value),  y=value, fill=variable)) + 
   geom_bar(stat="identity", colour = "black", position = "stack") + 
   my_theme_bar() + 
   theme(axis.title.x=element_blank()) +
   geom_text(aes(label=value), vjust=1, color="white", size=3.5) +
   scale_y_continuous(name="% of patients") + 
   labs(title="Fig. 7: Percent of ED Patients Presenting with Acute Symptoms who \nReceived Timely Care ",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   scale_fill_brewer(palette = "Accent",
                     breaks=c("OP_2", "OP_23"),
                     labels=c("% pts with chest pain or possible heart attack \nwho got drugs to break up blood clots within 30 minutes of arrival", "% pts with stroke symptoms who received \nbrain scan results within 45 minutes of arrival"))  + 
   theme(legend.title=element_blank(), legend.position="bottom")


bar_OP_18b18c

bar_OP_2_23

A brief look at the wait times for the hospitals.

# Subset data to metrics of interest and PA
df_hosp <- tec_hosp %>% filter(Measure.ID %in% varlist & State=="PA")

## General hospitals
# Get list of general hospital names
df_hosp_fac <- df_hosp %>% 
   mutate(addr=paste0(Address,", ",City, ", ",State," ",ZIP.Code)) %>%
   distinct(Facility.Name,addr,Measure.ID,Score) %>% 
   arrange(Facility.Name) %>%
   reshape(idvar = c("Facility.Name","addr"), timevar = "Measure.ID", direction = "wide", varying=varlist)

df_hosp_fac$OP_18b <- as.numeric(df_hosp_fac$OP_18b)
df_hosp_fac$OP_18c <- as.numeric(df_hosp_fac$OP_18c)
df_hosp_fac$OP_2 <- as.numeric(df_hosp_fac$OP_2)
df_hosp_fac$OP_23 <- as.numeric(df_hosp_fac$OP_23)
df_hosp_fac$OP_3b <- as.numeric(df_hosp_fac$OP_3b)

df_hosp_fac$addr <- gsub('34TH ST &', '3400', df_hosp_fac$addr)
df_hosp_fac$addr <- gsub('34TH &', '3400', df_hosp_fac$addr)

# Format table
dt_pahosp <- df_hosp_fac %>% dplyr::select(Facility.Name,OP_18b,OP_18c) %>% filter(! is.na(OP_18b) | ! is.na(OP_18c)) 
dt_pahosp$Facility.Name<-str_to_title(dt_pahosp$Facility.Name) 
dt_pahosp$OP_18b_time <- dt_pahosp$OP_18b %>% hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp$OP_18c_time <- dt_pahosp$OP_18c %>% hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")

dt_pahosp <- dt_pahosp %>% dplyr::rename('Hospital'=Facility.Name,
                                 'Time Spent in ED - All Patients'=OP_18b_time,
                                 'Time Spent in ED - Mental Health Patients'=OP_18c_time) 

# Calculate means
dt_pahosp_means <- dt_pahosp %>% dplyr::summarize(mean_wait_all=mean(OP_18b, na.rm = TRUE),
                                                  mean_wait_bhv=mean(OP_18c, na.rm = TRUE))

dt_pahosp_means$`All Patients` <- dt_pahosp_means$mean_wait_all %>% 
   hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp_means$`Mental Health Patients` <- dt_pahosp_means$mean_wait_bhv %>% 
   hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp_means <- dt_pahosp_means %>% dplyr::select(-mean_wait_all,-mean_wait_bhv) %>% gather()


# Final tables
head(dt_pahosp[order(dt_pahosp$`Time Spent in ED - All Patients`,decreasing=TRUE),],n=10) %>%
   dplyr::select(-OP_18b,-OP_18c) %>%
   kable(caption="Table 1. Longest Patient Wait Times at PA Hospitals (Top 10)", 
        align="lcc",
        row.names = FALSE) %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 1. Longest Patient Wait Times at PA Hospitals (Top 10)
Hospital Time Spent in ED - All Patients Time Spent in ED - Mental Health Patients
Jefferson Health- Northeast 06:26 NA
Milton S Hershey Medical Center 04:54 08:24
Hospital Of Univ Of Pennsylvania 04:11 04:44
Chester County Hospital 04:05 10:08
Penn Presbyterian Medical Center 03:57 04:34
Robert Packer Hospital 03:44 06:07
Upmc Memorial 03:44 04:24
Penn State Health St. Joseph 03:42 04:53
Main Line Hospital Lankenau 03:38 NA
Einstein Medical Center Montgomery 03:37 NA
tail(dt_pahosp[order(dt_pahosp$`Time Spent in ED - All Patients`,decreasing=TRUE),],n=10) %>%
   dplyr::select(-OP_18b,-OP_18c) %>%
   kable(caption="Table 2. Shortest Patient Wait Times at PA Hospitals (Bottom 10)", 
        align="lcc",
        row.names = FALSE) %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 2. Shortest Patient Wait Times at PA Hospitals (Bottom 10)
Hospital Time Spent in ED - All Patients Time Spent in ED - Mental Health Patients
Millcreek Community Hospital 01:36 03:28
Ahn Emerus Westmoreland, Llc 01:31 NA
Penn Highlands Tyrone 01:31 NA
Highlands Hospital 01:30 04:05
Upmc Kane 01:30 03:02
Mercy Catholic Medical Center- Mercy Fitzgerald 01:28 04:44
Lecom Health Corry Memorial Hospital 01:24 NA
Grove City Medical Center 01:22 NA
Bucktail Medical Center 01:19 NA
Conemaugh Meyersdale Medical Center 01:08 NA
dt_pahosp_means %>%
  kable(caption="Table 3. Mean Patient Wait Times at PA Hospitals ", 
        align="lr",
        row.names = FALSE,
        col.names=NULL) %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 3. Mean Patient Wait Times at PA Hospitals
All Patients 02:40
Mental Health Patients 05:03

Investigating Access to Hospitals

Next, I explore the availability of hospitals across Pennsylvania. I will map hospital densities using the hospital data from TEC.I will map all hospitals, then hospitals across a few key service lines, namely: pediatric hospitals, trauma centers and stroke centers.

Using the package ggmap, I will get latitude and longitude data for each hospital address in the TEC data using the Google API. To do this, I registered for a free Google console account, generated an API key that I registered in R Studio, and enabled the following APIs on the Google Console:
+ Directions API
+ Geocoding API
+ Geolocation API
+ Places API
+ Maps Javascript AVI
+ Maps Embed API

I then converted the data frame of resultant cartesion coordinates to an sf object. I used the the projection WGS84, which is the CRS code #4326, a priori defined in the sf object. I mapped the hospitals onto a map of Pennsylvania using the counties dataset tigris library, overlayed with county population data.

## All hospitals
   # getting map coordinates (lat and long) of hospital addresses 
   for(i in 1:nrow(df_hosp_fac))
   {
       ifelse(is.na(df_hosp_fac$addr[i]), NA,      
       hosp_ggmap_pa <- geocode(df_hosp_fac$addr[i], 
                                output = "more", 
                                source = "google", 
                                messaging = FALSE))
             df_hosp_fac$Longitude[i] <- as.numeric(hosp_ggmap_pa[1])
             df_hosp_fac$Latitude[i] <- as.numeric(hosp_ggmap_pa[2])
   }
   
   df_hosp_fac_tib <- as_tibble(df_hosp_fac)
   df_hosp_fac_sf <- st_as_sf(df_hosp_fac_tib, coords = c("Longitude", "Latitude"), crs = 4326)
   mapview(df_hosp_fac_sf) #Quickly checking addresses geocoded correctly
   # merge to PA county data to PA population data
   pa_tomap<-merge(pa_counties, pa_pop, by="NAME")
   class(pa_tomap) #check
## [1] "sf"         "data.frame"
## Specialty hospitals 
varlist_hosp <- c("Childrens","Stroke","Trauma")

for (var in varlist_hosp){
   
   # Initialize table
   tablet <- as.character(paste0("list_pa_",var))
   
   # Import
   list <- readxl::read_xlsx("/Users/moniquearnold/Documents/PSOM/BMIN 503/Final Project/BMIN503_Final_Project/Data/Specialty Hospitals.xlsx", sheet=var)
   
   # Getting map coordinates (lat and long) of specialty hospital addresses 
   list2 <- cbind(list, 
                  type_hosp=var,
                  geocode(list$Hospital, 
                          output = "more", 
                          source = "google", 
                          messaging = FALSE))
   
   list_tib <- as_tibble(list2)
   list_sf <- st_as_sf(list_tib, coords = c("lon", "lat"), crs = 4326)
   
   # Assign final table
   assign(tablet,list_sf)
   
}

# checking addresses geocoded correctly
# mapview(list_pa_Childrens)
# mapview(list_pa_Stroke)
# mapview(list_pa_Trauma)

spec_sf <- st_as_sf(rbind.fill(list_pa_Childrens,list_pa_Stroke,list_pa_Trauma)) %>% dplyr::select(-'...3')
class(spec_sf)
## [1] "sf"         "data.frame"
# Creating annotations for graphs
high_pa_tomap <- pa_tomap %>% filter(pop_2019 > 1000000) #labelling counties with pop > 1 million

for(i in 1:nrow(high_pa_tomap)){
    ifelse(is.na(high_pa_tomap$County[i]), NA,      
    high_pa_tomap2 <- geocode(high_pa_tomap$County[i], output = "latlon", source = "google", messaging = FALSE))
          high_pa_tomap$lon[i] <- as.numeric(high_pa_tomap2[1])
          high_pa_tomap$lat[i] <- as.numeric(high_pa_tomap2[2])
}

all_hosp_sf <- st_as_sf(rbind.fill(spec_sf,df_hosp_fac_sf) %>% 
   mutate(type_hosp = ifelse(is.na(type_hosp),"All Hospitals",type_hosp)))


# Maps

ggplot() + 
   geom_sf(data = pa_tomap, aes(fill = pop_2019/100000), color="white") + 
   geom_sf(data = df_hosp_fac_sf, size = 2, shape = 21, fill = "gold") +
   geom_sf_label_repel(data = high_pa_tomap, aes(x=lon,y=lat,label=NAME),
                       size=3, force=7, nudge_y=-6, nudge_x=1, seed = 20,
                       alpha=0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Population (in 100,000s)", option = "mako", direction = -1,
                         limit = range(prev_min(pa_tomap$pop_2019/100000), 
                                       prev_max(pa_tomap$pop_2019/100000))) +
   labs(title= "Fig. 8: Pennsylvania Specialty Service Line Hospitals",
        caption = "Source: Centers for Disease Control, US Census Bureau (2019 Census).") +
   theme(legend.justification = c(0, 1),legend.position = "bottom")

ggplot() + 
   geom_sf(data = pa_tomap, aes(fill = pop_2019/100000), color="white") + 
   geom_sf(data = spec_sf, size = 2, shape = 21, fill = "gold") +
   # geom_sf_label_repel(data = high_pa_tomap, aes(x=lon,y=lat,label=NAME), 
   #                     size=3, force=7, nudge_y=-6, nudge_x=1, seed = 20,
   #                     alpha=0.8) +
   facet_wrap(~type_hosp, shrink=FALSE, dir="v", nrow=1) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Population (in 100,000s)", option = "mako", direction = -1,
                         limit = range(prev_min(pa_tomap$pop_2019/100000), 
                                       prev_max(pa_tomap$pop_2019/100000))) +
   labs(title= "Fig. 9: Pennsylvania Specialty Service Line Hospitals",
        caption = "Source: Centers for Disease Control, US Census Bureau (2019 Census).") +
   theme(legend.justification = c(0, 1),
         legend.position = "bottom",
         legend.text = element_text(size = 6))

Investigating Socioeconomic/Demographic Differences by County

Next, I will investigate different key socioeconomic/demographic factors on average and by county in PA using the tidycensus package and 5 year ACS data ending in 2019. Key factors investigated are:
+ Poverty rate (DP03_0119PE)
+ Median household income (DP03_0062E)
+ Percentage population 25 years and over with high school grad or higher (DP02_0066PE)
+ Percentage population without health insurance coverage (DP03_0099PE)
+ Percentage population with public health insurance coverage (DP03_0098PE)
+ Percentage population with private health insurance (DP03_0097PE)
+ Percentage of population White (DP05_0037PE)
+ Percentage of population Asian (DP05_0044PE)
+ Percentage of population Black or African American (DP05_0038PE)
+ Percentage of population Hispanic or Latino (DP05_0078PE)
+ Percentage of population American Indian and Alaska Native (DP05_0039PE)

# Obtaining 2019 acs5 variable list to investigate
dp17 <- load_variables(2019, "acs5/profile", cache = TRUE)

# Obtaining COUNTY-LEVEL socioeconomic/demographic data
pa_factors <- get_acs(geography = "county", 
              variables = c(povertyrate = "DP03_0119P",
                            medincome = "DP03_0062",
                            edurate = "DP02_0067P",
                            unemploymentrate = "DP03_0009P",
                            nohealthins = "DP03_0099P",
                            pubhealthins = "DP03_0098P",
                            privhealthins = "DP03_0097P",
                            racewhite = "DP05_0077P",
                            raceasian = "DP05_0080P",
                            raceblack = "DP05_0078P",
                            raceamerind = "DP05_0079P",
                            racehawaii = "DP05_0081P",
                            racehispanic = "DP05_0071P",
                            raceother = "DP05_0082P"),
              state = 42, #PA is 42 
              year = 2019,
              geometry=TRUE)

unique(pa_factors$variable)
##  [1] "edurate"          "unemploymentrate" "medincome"        "privhealthins"    "pubhealthins"     "nohealthins"      "povertyrate"      "racehispanic"     "racewhite"        "raceblack"        "raceamerind"      "raceasian"        "racehawaii"       "raceother"
tableout <- data.frame(cbind(unique(pa_factors$variable),
                  c("Percentage of Population 25y+ with High Diploma or Equivalency",
                    "Unemployment Rate (%)",
                    "Median Household Income (in millions of USD)",
                    "Percentage of population with private health insurance",
                    "Percentage of population with public health insurance coverage",
                    "Percentage of population without health insurance coverage ",
                    "Poverty Rate (%)",
                    "Race -- % Hispanic or Latino",
                    "Race -- % White", 
                    "Race -- % Black or African American",
                    "Race -- % American Indian and Alaska Native",
                    "Race -- % Asian", 
                    "Race -- % Native Hawaiian and Other Pacific Islander",
                    "Race -- % Other Race")))

table_soc <- data.frame(pa_factors) %>% 
   group_by(variable) %>%
   dplyr::select(NAME,variable, estimate) %>% 
   mutate(estimate = as.numeric(estimate),
          variable = as.character(variable)) %>%
   dplyr::summarize(mean_estimate=mean(estimate, na.rm = TRUE)) %>%
   mutate(mean_estimate = ifelse(variable %in% c("medincome"), 
                             scales::dollar(mean_estimate), 
                             percent(mean_estimate))) %>%
   left_join(tableout,by=c("variable"="X1")) %>%
   dplyr::rename(`Socioeconomic/Demographic Factor`=X2,
                 `Mean Value`=mean_estimate)
   
table_soc %>% dplyr::select(`Socioeconomic/Demographic Factor`,`Mean Value`) %>%
  kable(caption="Table 4. Socioeconomic and Demographic Factors Across PA Counties", 
        align="lr", 
        digits = 3,  
        col.names=c("Factor","Mean Value"))  %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 4. Socioeconomic and Demographic Factors Across PA Counties
Factor Mean Value
Percentage of Population 25y+ with High Diploma or Equivalency 89.89%
Median Household Income (in millions of USD) $57,056.09
Percentage of population without health insurance coverage 5.91%
Poverty Rate (%) 8.31%
Percentage of population with private health insurance 72.23%
Percentage of population with public health insurance coverage 39.20%
Race – % American Indian and Alaska Native 0.10%
Race – % Asian 1.50%
Race – % Black or African American 4.44%
Race – % Native Hawaiian and Other Pacific Islander 0.01%
Race – % Hispanic or Latino 4.46%
Race – % Other Race 0.09%
Race – % White 87.87%
Unemployment Rate (%) 5.04%

Mapping socioeconomic/demographic variables.

# Mapping variables

tableout_econ <- tableout %>% filter(X1 %in% c("edurate","unemploymentrate","povertyrate","medincome"))

counter = 0       
for(i in unique(pa_factors$variable)){
   # dynamic figure labelling
   counter <- counter + 1
   figletter <- toupper(letters[counter])
   
   # dynamic titling
   tableout2 <- tableout_econ %>% filter(X1==i)
   titleout <- as.character(paste0("Fig. 9",figletter,". Pennsylvania ",tableout2$X2," by County"))
   
   # dynamic annotations
   anno_soc <- table_soc %>% filter(variable==i)
   meanlabel <- 
      paste0("Calculated Estimated Mean in PA: ",anno_soc$`Mean Value`)

   # subset data for variable of interest
   spec_table <- pa_factors %>% filter(variable==i)
   var_pa_tomap <- spec_table %>% 
      filter(NAME=="Allegheny County, Pennsylvania" | NAME=="Philadelphia County, Pennsylvania") %>%
      mutate(label=paste0(gsub(' County, Pennsylvania','',NAME),": ",estimate)) %>% 
      st_join(high_pa_tomap)

   # final maps
   demo_map <- ggplot() + 
      geom_sf(data = spec_table, aes(fill = estimate), color="white") + 
      # geom_sf(data = all_hosp_sf, size = 2, shape = 21, fill = "gold") +
      geom_sf_label_repel(data = var_pa_tomap, aes(x=lon,y=lat,label=label),
                          size=3, force=7, nudge_y=-5, nudge_x=1, seed = 10,
                          fill = alpha(c("white"),0.8)) +
      coord_sf() + 
      my_theme() + 
      scale_fill_viridis(discrete=FALSE, option = "rocket", direction = -1,
                            limit = range(prev_min(spec_table$estimate),
                                          prev_max(spec_table$estimate))) +
      labs(title = paste0(strwrap(titleout,width=50),collapse="\n"),
           caption = "Source: US Census Bureau (2019 Census).",
           subtitle = paste0(strwrap(meanlabel,width=50),collapse="\n")) +
      theme(legend.justification = c(0, 1),
            legend.position = "bottom")
            # plot.tag.position = c(),
            # plot.tag = element_text(vjust = 1, hjust = 1, size=8))
   print(demo_map)
}

Better visualizing racial makeup and health insurance variation by charts.

# Race
pa_factors_race <- pa_factors %>% data.frame() %>%
   filter(grepl("race", variable, fixed = TRUE)) %>%
   dplyr::select(NAME, variable, estimate) %>%
   mutate(County = paste0(gsub(' County, Pennsylvania','',NAME)))

pa_factors_race_wide <- pa_factors_race %>% 
   dplyr::select(-NAME) %>%
   reshape(idvar = c("County"), timevar = "variable", direction = "wide") %>%
   dplyr::arrange(desc(estimate.racewhite), .by_group = TRUE) %>%
   dplyr::rename(black=estimate.raceblack, 
                 white=estimate.racewhite,
                 asian=estimate.raceasian,
                 amerind=estimate.raceamerind,
                 hispanic=estimate.racehispanic,
                 hpislander=estimate.racehawaii,
                 other=estimate.raceother)

head_pa_race_long <- melt(head(pa_factors_race_wide, n=34), id.vars=c("County"), 
                            measure.vars=c("white","black","asian","amerind","hispanic","hpislander","other"),
    variable.name="race", value.name="percent")

tail_pa_race_long <- melt(tail(pa_factors_race_wide, n=34), id.vars=c("County"), 
                            measure.vars=c("white","black","asian","amerind","hispanic","hpislander","other"),
    variable.name="race", value.name="percent")


top10 <- head_pa_race_long %>% 
   filter(variable == "white")%>%
   arrange(-desc(value)) %>%
   bind_rows(head_pa_race_long %>% filter(variable != "white")) %>%
   mutate(County = factor(County, unique(County))) %>%
   ggplot(aes(x=County, y=value, fill=forcats::fct_rev(variable))) +
   geom_bar(stat="identity", colour = "black") + 
   coord_flip() + 
   my_theme_bar() +
   theme(axis.title.x=element_blank()) +
   scale_y_continuous(name="% of population") + 
   labs(title="Fig. 11: Racial Composition of Pennsylvania by County",
        caption = "") +
   scale_fill_brewer(palette = "Set3", direction=-1) + 
   theme(legend.title=element_blank(), 
         legend.position="bottom",
         axis.title.x=element_blank(),
         axis.title.y=element_blank())

bottom10 <- tail_pa_race_long %>% 
   filter(variable == "white")%>%
   arrange(-desc(value)) %>%
   bind_rows(tail_pa_race_long %>% filter(variable != "white")) %>%
   mutate(County = factor(County, unique(County))) %>%
   ggplot(aes(x=County, y=value, fill=forcats::fct_rev(variable))) +
   geom_bar(stat="identity", colour = "black") + 
   coord_flip() + 
   my_theme_bar() +
   theme(axis.title.x=element_blank()) +
   scale_y_continuous(name="% of population") + 
   labs(title="",
        caption = "Source: US Census Bureau.") +
   scale_fill_brewer(palette = "Set3", direction=-1) + 
   theme(legend.title=element_blank(), 
         legend.position="bottom",
         axis.title.x=element_blank(),
         axis.title.y=element_blank())

mylegend<-g_legend(top10) #shared legend

grid.arrange(arrangeGrob(top10 + theme(legend.position="none"),
                         bottom10 + theme(legend.position="none"),
                         nrow=1),
             mylegend, nrow=2,heights=c(10, 1))

# Insurance
pa_noins <- pa_factors %>% data.frame() %>%
   filter(grepl("ins", variable, fixed = TRUE)) %>%
   dplyr::select(NAME, variable, estimate) %>%
   mutate(County = paste0(gsub(' County, Pennsylvania','',NAME))) %>% 
   filter(variable == "nohealthins")

pa_noins %>% 
   arrange(-desc(estimate)) %>%
   mutate(County = factor(County, unique(County))) %>%
   ggplot(aes(x=County, y=estimate)) +
   geom_bar(stat="identity", color="black", fill="darkorange") + 
   geom_abline(slope=0, intercept=5.9,  col = "darkgreen",lty=2) +
   geom_abline(slope=0, intercept=8.6,  col = "black",lty=2) +
   scale_y_continuous(name="% of population") +
   my_theme_bar() +
   labs(title=paste0(strwrap("Fig. 12: Percentage of Population Without Health Insurance in Pennsylvania by County",width=50),collapse="\n"),
        caption = "Surce: US Census Bureau") +
   theme(legend.title=element_blank(), 
         legend.position="bottom",
         axis.title.x=element_blank(),
         axis.text.x = element_text(angle = 75,  hjust=0.75))

Linear Regressions

Next, I run multivariate analyses to see if there are any relations between the socioeconomic/demographic variables and the TEC data. As mentioned, I first convert both to zip code by using zcta.

## Obtaining ZIP-CODE TABULATION AREA LEVEL socioeconomic/demographic data
varying <- c("edurate","unemploymentrate","medincome","nohealthins","povertyrate","racehispanic","racewhite","raceblack")

zc_factors <- get_acs(geography = "zip code tabulation area", 
              variables = c(povertyrate = "DP03_0119P",
                            medincome = "DP03_0062",
                            edurate = "DP02_0067P",
                            unemploymentrate = "DP03_0009P",
                            nohealthins = "DP03_0099P",
                            racewhite = "DP05_0077P",
                            raceblack = "DP05_0078P",
                            ethhispanic = "DP05_0071P"),
              state = 42, #PA is 42 
              year = 2019) %>% 
   dplyr::select(-moe) %>%
   as.data.frame() %>%
   reshape(idvar = c("GEOID","NAME"), timevar = "variable", direction = "wide")
   
## Obtaining ZIP-CODE TABULATION AREA LEVEL TEC data for hospital wait time (most populous field)

# Calculating assumption straight line 5 mile radius from each hospital
zc_tec_OP18b <- df_hosp_fac %>% filter(!is.na(OP_18b))
lat <- as.matrix(zc_tec_OP18b$Latitude)
long <- as.matrix(zc_tec_OP18b$Longitude)

# Since search_dataset takes forever to run, I first did this code, then saved down the dataset, and for subsequent runs and knits I pulled from the save down stataset
   # zc_tec_OP18b2 <- zc_tec_OP18b %>% 
   #    rowwise() %>% 
   #    mutate(CTD = list(search_radius(Latitude,Longitude,radius=5))) %>%
   #    mutate(ziplist=list(unlist(CTD[[1]])))
   # # Converting the lists of zipcodes into individual columns for each zip code
   # zc_tec_OP18b3 <- zc_tec_OP18b2 %>%
   #    as.data.frame() %>%
   #    mutate(ziplist2 = as.character(ziplist))
   # 
   # zc_tec_OP18b_final<- setDT(zc_tec_OP18b3)[, paste0("zipcode", 1:34) := tstrsplit(ziplist2, '", "')]
   # 
   # zipnames <- zc_tec_OP18b_final %>% 
   #    dplyr:: select(starts_with("zipcode"))
   # 
   # zc_tec_OP18b_final <- zc_tec_OP18b_final %>% 
   #    mutate_at(.vars = names(zipnames), 
   #             .funs = gsub,
   #             pattern = "[^[:alnum:] ]",
   #             replacement = "\\") %>%
   #    mutate_at(.vars = names(zipnames), 
   #             .funs = gsub,
   #             pattern = "c",
   #             replacement = "\\") %>%
   #    mutate_at(.vars=names(zipnames), .funs=as.character) %>%
   #    dplyr::select(Facility.Name, addr, Longitude, Latitude, OP_18b, names(zipnames))
   # 
   # # Saving dataset because search_radius takes some time to run
   # write.csv(zc_tec_OP18b_final, "zc_tec_OP18b_final.csv", row.names=F)

zc_tec_OP18b_final<- read.csv(file = "zc_tec_OP18b_final.csv")

# Wrangling from long to wide to use for regressions
forreg_zc_OP18b <- gather(zc_tec_OP18b_final, label, zipcode, zipcode1:zipcode34, factor_key=TRUE) %>%
   dplyr::select(-label) %>%
   arrange(zipcode) %>%
   na.omit %>%
   filter(!zipcode=="harater0") %>%
   #mutate(zipcode2=ifelse(as.numeric(forreg_zc_OP18b$zipcode)<10000, paste0("0", forreg_zc_OP18b$zipcode), forreg_zc_OP18b$zipcode)) %>%
   filter(!as.numeric(zipcode)>40000) %>% #dropping ohio zipcodes
   filter(!as.numeric(zipcode)<10000) #dropping new jersey zipcodes

# Calculating mean wait time for zipcodes serviced by more than one hospital
forreg_zc_mean <- forreg_zc_OP18b %>% 
  group_by(zipcode) %>%
  summarise_at(vars(OP_18b), list(OP_18b_mean = mean))

# Lookup zcta for zipcodes using crosswalk dataset (zcta package)[https://github.com/jjchern/zcta]
zip_data <- zcta::zipzcta %>% filter(state=="PA")
zip_more <- zipcodeR::zip_code_db %>% filter(state=="PA")

forreg_zc <- forreg_zc_mean %>%
   left_join(zip_data, by = c("zipcode" = "zip")) %>%
   left_join(zc_factors, by=c("zcta" = "GEOID")) %>%
   left_join(zip_more, by=c("zipcode" = "zipcode"))

head(forreg_zc)
## # A tibble: 6 × 38
##   zipcode OP_18b_mean po_name      state.x zip_type zcta  NAME  estimate.edurate estimate.unempl… estimate.medinc… estimate.noheal… estimate.povert… estimate.ethhis… estimate.racewh… estimate.racebl… zipcode_type major_city post_office_city common_city_list county state.y   lat   lng timezone radius_in_miles area_code_list population population_dens… land_area_in_sq… water_area_in_s… housing_units
##   <chr>         <dbl> <chr>        <chr>   <chr>    <chr> <chr>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl> <chr>        <chr>      <chr>                      <blob> <chr>  <chr>   <dbl> <dbl> <chr>              <dbl>         <blob>      <int>            <dbl>            <dbl>            <dbl>         <int>
## 1 15003           135 Ambridge     PA      ZIP Cod… 15003 ZCTA…             93.2              4.1            50189              3.9              6.4              2.8             82.6             10.6 Standard     Ambridge   Ambridge, PA           <raw 33 B> Beave… PA       40.6 -80.2 Eastern            3         <raw 15 B>      11861             1816             6.53             0.37          6199
## 2 15009           163 Beaver       PA      ZIP Cod… 15009 ZCTA…             95.6              4.2            63738              2.1              6.4              2.2             94.1              1.5 Standard     Beaver     Beaver, PA             <raw 48 B> Beave… PA       40.7 -80.4 Eastern            6         <raw 22 B>      15082              615            24.5              1.4           6839
## 3 15014           182 Brackenridge PA      ZIP Cod… 15014 ZCTA…             93.2              5.8            42500              6               14.3              1               94.5              3.8 Standard     Brackenri… Brackenridge, PA       <raw 24 B> Alleg… PA       40.6 -79.7 Eastern            0.625     <raw 15 B>       3184             6264             0.51             0.04          1617
## 4 15017           159 Bridgeville  PA      ZIP Cod… 15017 ZCTA…             94.4              3.6            67058              3.2              4.5              1.1             91.3              2.1 Standard     Bridgevil… Bridgeville, PA        <raw 23 B> Alleg… PA       40.4 -80.1 Eastern            4         <raw 15 B>      16213             1244            13.0              0.05          7875
## 5 15020           175 Bunola       PA      Post Of… 15020 ZCTA…             89.1              0                 NA             13.9              0                0              100                0   PO Box       Bunola     Bunola, PA             <raw 18 B> Alleg… PA       40.2 -80.0 Eastern            0.511     <raw 15 B>        231              175             1.32             0.46           121
## 6 15022           175 Charleroi    PA      ZIP Cod… 15022 ZCTA…             93.6              5.8            48051              3.3             10.7              1.7             90.5              3.5 Standard     Charleroi  Charleroi, PA          <raw 34 B> Washi… PA       40.1 -79.9 Eastern            5         <raw 15 B>      10340              573            18.1              0.24          5391
## # … with 7 more variables: occupied_housing_units <int>, median_home_value <int>, median_household_income <int>, bounds_west <dbl>, bounds_east <dbl>, bounds_north <dbl>, bounds_south <dbl>
# Linear Regression (lm and glm produce similar results)
regout <- lm(data = forreg_zc, OP_18b_mean ~ population + population_density + 
      estimate.edurate + estimate.unemploymentrate + estimate.povertyrate + estimate.medincome +
      estimate.nohealthins + 
      estimate.racewhite + estimate.raceblack + estimate.ethhispanic)
stargazer::stargazer(regout)
## 
## % Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
## % Date and time: Tue, Dec 07, 2021 - 21:18:18
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}}lc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
##  & \multicolumn{1}{c}{\textit{Dependent variable:}} \\ 
## \cline{2-2} 
## \\[-1.8ex] & OP\_18b\_mean \\ 
## \hline \\[-1.8ex] 
##  population & 0.0003$^{***}$ \\ 
##   & (0.0001) \\ 
##   & \\ 
##  population\_density & 0.001$^{**}$ \\ 
##   & (0.0004) \\ 
##   & \\ 
##  estimate.edurate & 0.377 \\ 
##   & (0.338) \\ 
##   & \\ 
##  estimate.unemploymentrate & $-$0.437 \\ 
##   & (0.462) \\ 
##   & \\ 
##  estimate.povertyrate & $-$0.132 \\ 
##   & (0.249) \\ 
##   & \\ 
##  estimate.medincome & 0.0001 \\ 
##   & (0.0001) \\ 
##   & \\ 
##  estimate.nohealthins & $-$0.842$^{**}$ \\ 
##   & (0.373) \\ 
##   & \\ 
##  estimate.racewhite & $-$1.466$^{***}$ \\ 
##   & (0.340) \\ 
##   & \\ 
##  estimate.raceblack & $-$1.294$^{***}$ \\ 
##   & (0.355) \\ 
##   & \\ 
##  estimate.ethhispanic & $-$0.520 \\ 
##   & (0.374) \\ 
##   & \\ 
##  Constant & 260.758$^{***}$ \\ 
##   & (48.144) \\ 
##   & \\ 
## \hline \\[-1.8ex] 
## Observations & 525 \\ 
## R$^{2}$ & 0.228 \\ 
## Adjusted R$^{2}$ & 0.213 \\ 
## Residual Std. Error & 30.514 (df = 514) \\ 
## F Statistic & 15.216$^{***}$ (df = 10; 514) \\ 
## \hline 
## \hline \\[-1.8ex] 
## \textit{Note:}  & \multicolumn{1}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
## \end{tabular} 
## \end{table}

Discussion

As shown in the above maps, patients in Pennsylvania, on average, spend more time waiting in the ED than the national average (Figures 1 and 2). Additionally, patients who go to the ED with signs of a possible heart attack or stroke, on average, were received timely care less often (Figures 3 and 4). Patients with chest pain in PA had to wait longer before transfer to a specialty hospital than the US national average (Figure 5).

Though PA underperforms nationally, the regional graphs shows that PA performs the second best in the Northeast region (behind Vermont) with respect to general ED wait time (Fig. 6), but on par with the region for timely care for acute life-threatening symptoms (chest pain, stroke symptoms, Fig. 7). problem solving.

table 1-3 table 4

As expected, the largest number of hospitals are in the most densely populated counties (Fig. 8). fig 9, figs 10

Philadelphia, is the most populous and racially and ethnically diverse county, and has a greater proportion of aging individuals living in poverty and with poor access to health care.[AARP]

Assumptions/Limitations

  1. Assumption that only pediatric hospitals see pediatric patients, which is not true. This underestimates access to pediatric care, but still exists as a proxy for access to high quality pediatric care.
  2. Assumption that only pediatric hospitals see pediatric patients, which is not true. This underestimates access to pediatric care, but still exists as a proxy for access to high quality pediatric care.
    Assumption of a 5-mile radius around each hospital with an ED, thereby assuming this is the service area for each hospital ED. This is inherently limiting given, for example, that patients are sometimes transported by helicopter for emergent care (eg. Via PennStar), and patients in rural areas may have to drive more than 5 miles. Thus, it may underestimate how far patients may travel for emergent care.
  3. Assumption that if a zipcode is covered by more than 1 hospital, the TEC metric used in the regression will be the average of those hospital’s metrics.

Acknowledgements

I would like to thank the following advisors, provided invaluable advice and guidance for data sources, mapping the data and assumptions for the regression analysis:
+ Dr. Gary Weissman, MD, MSHP (Assistant Professor of Medicine and Senior Fellow Leonard Davies Institute)
+ Dr. John Holmes, PhD, FACE, FACMI (Professor of Medical Informatics in Epidemiology, Associate Director of the Institute for Biomedical Informatics)

I would like to thank my classmates in BMIN 503 for providing inspiration as well as valuable peer-peer

References

  • America’s Health Rankings analysis of America’s Health Rankings composite measure, United Health Foundation, AmericasHealthRankings.org, Accessed 2021.
  • Ballard, MD, PhD, MSPH, FACP, David J., ed. Achieving STEEEP Health Care: Baylor Health Care System’s Quality Improvement Journey. Productivity Press, 2019.
  • Elements of Access to Health Care. Content last reviewed June 2018. Agency for Healthcare Research and Quality, Rockville, MD. Accessed at https://www.ahrq.gov/research/findings/nhqrdr/chartbooks/access/elements.html
  • Kaiser Family Foundation. “The Pennsylvania Health Care Landscape.” (2016).
  • Kingery, Kate L. “County Health Rankings & Roadmaps.” Journal of Youth Development 13.3 (2018): 259-263. Rust, George, et al. “Practical barriers to timely primary care access: impact on adult use of emergency department services.” Archives of internal medicine 168.15 (2008): 1705-1710.
  • “Disrupting Disparities in Pennsylvania: Retooling for Geographic, Racial and Ethnic Growth.” College of Nursing and Health Professions, 5 Apr. 2021.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] stargazer_5.2.2       zcta_0.0.1            zipcodeR_0.3.3        scales_1.1.1          tidycensus_1.1        ggsflabel_0.0.1       DT_0.20               tigris_1.5            ggmap_3.0.0           choroplethrMaps_1.0.1 choroplethr_3.7.0     acs_2.1.4             XML_3.99-0.8          tmap_3.3-2            spDataLarge_2.0.1     spData_2.0.1          raster_3.5-2          cowplot_1.1.1        
## [19] mapproj_1.2.7         maps_3.4.0            RColorBrewer_1.1-2    viridis_0.6.2         viridisLite_0.4.0     rgdal_1.5-27          sp_1.4-6              sf_1.0-4              ggsn_0.5.0            leaflet_2.0.4.1       mapview_2.10.0        modelsummary_0.9.4    janitor_2.1.0         gridExtra_2.3         lubridate_1.8.0       survey_4.1-1          Matrix_1.3-4          ggpubr_0.4.0         
## [37] Hmisc_4.6-0           Formula_1.2-4         survival_3.2-13       lattice_0.20-45       rgenoud_5.8-3.0       broom_0.7.10          kableExtra_1.3.4      reshape_0.8.8         data.table_1.14.2     tableone_0.13.0       plyr_1.8.6            readxl_1.3.1          forcats_0.5.1         stringr_1.4.0         dplyr_1.0.7           purrr_0.3.4           readr_2.1.0           tidyr_1.1.4          
## [55] tibble_3.1.6          ggplot2_3.3.5         tidyverse_1.3.1       rmarkdown_2.11        devtools_2.4.2        usethis_2.1.3         knitr_1.36           
## 
## loaded via a namespace (and not attached):
##   [1] pacman_0.5.1            utf8_1.2.2              tidyselect_1.1.1        RSQLite_2.2.8           htmlwidgets_1.5.4       maptools_1.1-2          munsell_0.5.0           codetools_0.2-18        units_0.7-2             withr_2.4.2             colorspace_2.0-2        tables_0.9.6            highr_0.9               uuid_1.0-3              rstudioapi_0.13         stats4_4.1.1           
##  [17] wk_0.5.0                ggsignif_0.6.3          labeling_0.4.2          RgoogleMaps_1.4.5.3     WDI_2.7.4               farver_2.1.0            bit64_4.0.5             rprojroot_2.0.2         vctrs_0.3.8             generics_0.1.1          xfun_0.28               R6_2.5.1                RJSONIO_1.3-1.6         bitops_1.0-7            cachem_1.0.6            assertthat_0.2.1       
##  [33] nnet_7.3-16             gtable_0.3.0            lwgeom_0.2-8            leafpop_0.1.0           processx_3.5.2          rlang_0.4.12            systemfonts_1.0.2       splines_4.1.1           rstatix_0.7.0           dichromat_2.0-0         brew_1.0-6              checkmate_2.0.0         s2_1.0.7                yaml_2.2.1              abind_1.4-5             modelr_0.1.8           
##  [49] crosstalk_1.2.0         backports_1.3.0         tools_4.1.1             ellipsis_0.3.2          jquerylib_0.1.4         proxy_0.4-26            sessioninfo_1.2.1       Rcpp_1.0.7              base64enc_0.1-3         classInt_0.4-3          ps_1.6.0                prettyunits_1.1.1       rpart_4.1-15            tmaptools_3.1-1         ggrepel_0.9.1           haven_2.4.3            
##  [65] cluster_2.1.2           fs_1.5.0                leafem_0.1.6            magrittr_2.0.1          reprex_2.0.1            pkgload_1.2.3           hms_1.1.1               evaluate_0.14           jpeg_0.1-9              testthat_3.1.0          compiler_4.1.1          KernSmooth_2.23-20      crayon_1.4.2            htmltools_0.5.2         tzdb_0.2.0              DBI_1.1.1              
##  [81] dbplyr_2.1.1            rappdirs_0.3.3          car_3.0-12              cli_3.1.0               mitools_2.4             parallel_4.1.1          pkgconfig_2.0.3         foreign_0.8-81          terra_1.4-11            xml2_1.3.2              svglite_2.0.0           bslib_0.3.1             webshot_0.5.2           rematch_1.0.1           rvest_1.0.2             snakecase_0.11.0       
##  [97] callr_3.7.0             digest_0.6.28           cellranger_1.1.0        leafsync_0.1.0          htmlTable_2.3.0         curl_4.3.2              satellite_1.0.4         rjson_0.2.20            lifecycle_1.0.1         jsonlite_1.7.2          carData_3.0-4           desc_1.4.0              fansi_0.5.0             pillar_1.6.4            fastmap_1.1.0           httr_1.4.2             
## [113] pkgbuild_1.2.0          glue_1.5.0              remotes_2.4.1           png_0.1-7               leaflet.providers_1.9.0 bit_4.0.4               class_7.3-19            stringi_1.7.5           sass_0.4.0              blob_1.2.2              latticeExtra_0.6-29     stars_0.5-4             memoise_2.0.0           e1071_1.7-9